376 results found.
Written
Corpus,
Language Type:
Monolingual
Languages:
Spanish
Availability:
Freely Available
License:
Size:
None
Production Status:
Existing-used
Use:
Named Entity Recognition
- Paper title: Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain
- Paper track: Short/NLP Applications
- Paper status: Accept
| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Lukas Lange | PharmaCoNER corpus | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Czech English French German Spanish Swedish
Availability:
Freely Available
License:
Creative Commons
Size:
7 GByte
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
- Paper title: Document Translation vs. Query Translation for Cross-Lingual Information Retrieval in the Medical Domain
- Paper track: Long/Information Retrieval and Text Mining
- Paper status: Accept
| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Shadi Saleh | Extended CLEF eHealth 2013-2015 IR Test Collection | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Czech English French German Hungarian Polish Spanish Swedish
Availability:
Freely Available
License:
Creative Commons
Size:
2 MByte
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
- Paper title: Document Translation vs. Query Translation for Cross-Lingual Information Retrieval in the Medical Domain
- Paper track: Long/Information Retrieval and Text Mining
- Paper status: Accept
| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Shadi Saleh | Khresmoi Summary Translation Test Data 2.0 | /N |
Documentation:
None
Written
Lexicon,
Language Type:
Multilingual
Languages:
English French German Italian Spanish
Availability:
Freely Available
License:
Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Size:
None
Production Status:
Newly created-in progress
Use:
Word Sense Disambiguation
- Paper title: CluBERT: A Cluster-Based Approach for Learning Sense Distributions in Multiple Languages
- Paper track: Long/Semantics: Lexical
- Paper status: Accept
| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Bianca Scarlini | CluBERT Distributions | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Bulgarian Croatian Czech Danish Dutch English Estonian Finnish French German Greek Hungarian Icelandic Irish Italian Latvian Lithuanian Maltese Polish Portuguese Romanian Slovak Slovenian Spanish Swedish
Availability:
Freely Available
License:
CC0
Size:
341,856,530 sentences
Production Status:
Newly created-in progress
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: ParaCrawl: Web-Scale Acquisition of Parallel Corpora
- Paper track: Long/Resources and Evaluation
- Paper status: Accept
| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Philipp Koehn | ParaCrawl | /N |
Documentation:
None
Written
Treebank,
Language Type:
Multilingual
Languages:
Chinese English French German Italian Japanese Russian Spanish
Availability:
Freely Available
License:
Creative Commons
Size:
None
Production Status:
Existing-used
Use:
Parsing and Tagging
- Paper title: Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
- Paper track: Short/Machine Learning for NLP
- Paper status: Accept
| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Mozhi Zhang | Universal Dependencies | /N |
Documentation:
None
Written
Evaluation Data,
Language Type:
Multilingual
Languages:
Chinese English French German Italian Japanese Russian Spanish
Availability:
From NIST
License:
Size:
None
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
- Paper title: Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
- Paper track: Short/Machine Learning for NLP
- Paper status: Accept
| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Mozhi Zhang | Reuters RCV1/RCV2 Multilingual Corpus | /N |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
English French Italian Spanish
Availability:
Freely Available
License:
Creative Commons BY-NC-ND 4.0 International
Size:
3,370 <audio-transcript-translation> triplets (Other)
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus
- Paper track: Long/Machine Translation
- Paper status: Accept
| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Marco Turchi | MuST-SHE | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Arabic Chinese English German Hindi Spanish Vietnamese
Availability:
Freely Available
License:
Size:
50+ GByte
Production Status:
Existing-used
Use:
Machine Learning
- Paper title: MLQA: Evaluating Cross-lingual Extractive Question Answering
- Paper track: Long/Question Answering
- Paper status: Accept
| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Patrick Lewis | Wikipedia | /N |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Dutch English French German Italian Polish Portuguese Spanish
Availability:
Freely Available
License:
CC BY 4.0
Size:
None
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
- Paper title: LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
- Paper track: 8.1 Feature extraction and low-level feature model/Oral Presentation
- Paper status: Accept
| Author Number | Name | Resource Name | Country |
|---|---|---|---|
| Main Contact | Laurent Besacier | Multilingual LibriSpeech (MLS) | /N |
Documentation:
https://arxiv.org/abs/2012.03411, English, public




